# DGMD S-14 Summer 2020 Final Project

## Improving Interactions with Virtual Reality Objects in the AWS Cloud using IoT Wearable Devices

This project is an exploration into the interactions with virtual reality objects hosted within the cloud using IoT wearable devices. We analyzed the effects by performing similar interactions on a physical cube and a virtual cube: we built a wearable device using the STMicroelectronics Sensor Tile, hooked it up to a virtual object in the cloud, and then captured data while interacting with that object. We built this project as part of the DGMD S-14 class at Harvard.
```python
# Importing the necessary libraries

# Built-in libraries
import os.path
import glob
import json
from datetime import datetime
import statistics

# Data science libraries
import pandas
import numpy
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Plotting libraries
import plotly.offline as pyo
import plotly.graph_objs as go

# Jupyter Notebook libraries
from IPython.display import display
```

### Physical Cube Data

Here I am gathering, cleaning, and assembling the data from the interactions with the physical cube into a more usable form. I do this by creating a function that will clean and format all the data.
```python
# Physical cube data
PHYSICAL_CUBE_PATH = os.path.join("putty_data")

# Conversion function
def createDataFrameFromPuttyLogFile(filePath: str) -> pandas.DataFrame:
    """
    This function takes the file path of a PuTTY log file containing Sensor
    Tile data and cleans and converts the data into a pandas.DataFrame.

    The data in the PuTTY log file must be in the following form for each line:

        A_y: int, TimeStamp: hour:minutes:seconds.microseconds

    The data will first be cleaned by this function, removing any PuTTY log
    headers, blank lines, and partial lines. Also, as a precaution, it will
    remove the final line even if it is valid, to avoid any cases of partial
    lines being sent to the PuTTY instance.

    Args:
        filePath (str): the file path of the PuTTY log file

    Returns:
        pandas.DataFrame: returns a dataframe in the following form:

            +-------+-------+-----------------+
            | Index | A_y   | TimeStamp       |
            |-------+-------+-----------------|
            | 0     | (int) | (datetime.time) |
            +-------+-------+-----------------+

    Raises:
        None
    """
    # Opening the file and reading the file data
    with open(filePath) as puttyDataFile:
        puttyData = puttyDataFile.read()

    # Splitting the file data into lines of data
    puttyData = puttyData.split("\n")

    ##############################
    # Filtering Process
    ##############################

    # Filtering out the PuTTY log header
    puttyData = list(filter(lambda d: "~=" not in d, puttyData))

    # Filtering out blank lines
    puttyData = list(filter(lambda d: d != "", puttyData))

    # Filtering out lines with no comma delimiter
    puttyData = list(filter(lambda d: "," in d, puttyData))

    # Filtering out partial lines (in this case ones that don't have both
    # the A_y and TimeStamp keys)
    puttyData = list(filter(lambda d: "A_y:" in d and "TimeStamp:" in d, puttyData))

    # Removing the final line of data, as in some cases it may be a partial
    # line but not detected by the prior filters
    puttyData = puttyData[0:len(puttyData) - 1]

    ##############################
    # Data Manipulation Process
    ##############################

    # In this code I am doing several operations. I loop through every single
    # line of PuTTY data and for each line I create a dictionary in the form of:
    #
    # {
    #     A_y: int,
    #     TimeStamp: datetime.time
    # }
    #
    # The operations I am performing are:
    # 1) Split the line on a comma, take the left half of the line, parse out
    #    the A_y value, and then finally convert it to an integer
    # 2) Split the line on a comma, take the right half, parse out the string
    #    that represents the time, convert that to a datetime.datetime object
    #    via .strptime(), and then from that datetime, convert it to a
    #    datetime.time object via .time(). Since the timestamps don't include
    #    day/month/year I am only taking the time portion of it.
    puttyData = [{
        "A_y": int(line.split(", ")[0].split(":")[1].strip()),
        "TimeStamp": datetime.strptime(
            line.split(", ")[1].split(":", 1)[1].strip(),
            "%H:%M:%S.%f"
        ).time(),
    } for line in puttyData]

    # Finally, piping the list of dictionaries into a pandas.DataFrame
    df = pandas.DataFrame(puttyData)

    return df
```

In this step I break up all the data sets (1ft, 2ft, 3ft) into different lists of data frames for easier processing. You can see that 10 trials were run for each height.
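As a quick illustration, the two split operations described in the comments above can be traced on a single hypothetical log line (the values here are made up for demonstration):

```python
from datetime import datetime

# A hypothetical, well-formed PuTTY log line (values are illustrative)
line = "A_y: 1042, TimeStamp: 13:37:02.125000"

# Step 1: left half of the comma split -> the integer A_y value
a_y = int(line.split(", ")[0].split(":")[1].strip())

# Step 2: right half -> the time string, parsed with strptime and
# reduced to a datetime.time via .time()
timeStamp = datetime.strptime(
    line.split(", ")[1].split(":", 1)[1].strip(),
    "%H:%M:%S.%f"
).time()

print(a_y)        # 1042
print(timeStamp)  # 13:37:02.125000
```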
```python
# Creating a list of dataframes for each of the three height throwing tests
oneFootThrowDataFrameList = [
    createDataFrameFromPuttyLogFile(filePath)
    for filePath in glob.glob(os.path.join(PHYSICAL_CUBE_PATH, "putty?.txt"))
]
print("Count of DataFrames for Physical 1ft Throws:", len(oneFootThrowDataFrameList))

twoFootThrowDataFrameList = [
    createDataFrameFromPuttyLogFile(filePath)
    for filePath in glob.glob(os.path.join(PHYSICAL_CUBE_PATH, "putty1?.txt"))
]
print("Count of DataFrames for Physical 2ft Throws:", len(twoFootThrowDataFrameList))

threeFootThrowDataFrameList = [
    createDataFrameFromPuttyLogFile(filePath)
    for filePath in glob.glob(os.path.join(PHYSICAL_CUBE_PATH, "putty2?.txt"))
]
print("Count of DataFrames for Physical 3ft Throws:", len(threeFootThrowDataFrameList))

accYAdjustment = [
    df["A_y"].median()
    for df in oneFootThrowDataFrameList + twoFootThrowDataFrameList + threeFootThrowDataFrameList
]
accYAdjustment = statistics.mean(accYAdjustment)
```

Here I am creating a function that will take a data frame and generate a chart from it. The function will also display the chart in the notebook.
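The three glob patterns partition the trial files by number: `putty?.txt` matches exactly one trailing character, so it picks up `putty0.txt` through `putty9.txt`, while `putty1?.txt` and `putty2?.txt` pick up the 10–19 and 20–29 ranges. A small sketch with `fnmatch` (the matching rules `glob` uses, applied here to hypothetical file names mirroring the naming scheme) shows the split:

```python
import fnmatch

# Hypothetical file names mirroring the naming scheme in putty_data/
names = [f"putty{n}.txt" for n in range(30)]

oneFoot = fnmatch.filter(names, "putty?.txt")     # putty0.txt .. putty9.txt
twoFoot = fnmatch.filter(names, "putty1?.txt")    # putty10.txt .. putty19.txt
threeFoot = fnmatch.filter(names, "putty2?.txt")  # putty20.txt .. putty29.txt

print(len(oneFoot), len(twoFoot), len(threeFoot))  # 10 10 10
```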
```python
# Plotting and describing function
def displayChartFromPuttyDataFrame(
    puttyDf: pandas.DataFrame,
    testType: int,
    testNumber: int
) -> None:
    """
    Displays a plotly chart and the descriptive statistics of a dataframe
    within a Jupyter notebook based on the data from PuTTY.

    Args:
        puttyDf (pandas.DataFrame): a dataframe in the form of

            +-------+-------+-----------------+
            | Index | A_y   | TimeStamp       |
            |-------+-------+-----------------|
            | 0     | (int) | (datetime.time) |
            +-------+-------+-----------------+

        testType (int): represents the number of feet the cube was thrown
            for that test
        testNumber (int): the test number (used to determine chart title)
    """
    # Plotting the plotly chart out. This chart uses 2 lines, one of them
    # plots the following:
    #     x=TimeStamp
    #     y=Acc_Y
    #
    # The other line plots a moving average of the same data (currently set
    # to a 10-point MA)
    fig = go.Figure({
        # Data
        "data": [
            go.Scatter(
                x=puttyDf["TimeStamp"],
                y=puttyDf["A_y"],
                name="Acc_Y",
            ),
            go.Scatter(
                x=puttyDf["TimeStamp"],
                y=puttyDf["A_y"].rolling(window=10).mean(),
                name="10-n MA Acc_Y",
            ),
        ],
        # Chart styling features
        "layout": go.Layout(
            title="Test #{} - {} Foot Throw".format(testNumber, testType),
            titlefont={
                "size": 32,
                "color": "#000000"
            },
            plot_bgcolor="#ffffff",
            xaxis={
                "gridcolor": "#dddddd",
            },
            yaxis={
                "gridcolor": "#dddddd",
                "range": [0, 3000],
            }
        )
    })

    # Displaying the chart
    fig.show()

    # Displaying the descriptive statistics of the dataframe passed into
    # this function
    display(puttyDf.describe())
```

For each of the data frames in each of the lists, I plot out a chart. In total there will be 30 charts displaying the timestamp on the X-axis and Y-acceleration on the Y-axis. I also display some descriptive statistics for each individual data frame beneath the charts.
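The smoothed line above comes from pandas' `.rolling(window=10).mean()`. Conceptually it behaves like this stdlib-only sketch (a simplified stand-in for illustration, not the pandas implementation):

```python
def movingAverage(values, window=10):
    """Average of the current value and the (window - 1) values before it.

    Positions with fewer than `window` values yield None, mirroring the
    NaNs pandas produces at the start of a rolling window.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

print(movingAverage([1, 2, 3, 4, 5], window=3))  # [None, None, 2.0, 3.0, 4.0]
```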
```python
# Plotting out all charts from the three different test cases using the same
# scale for each of the different test cases
i = 0
for df in oneFootThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 1, i)

for df in twoFootThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 2, i)

for df in threeFootThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 3, i)
```

Here I am manipulating the data so that I can get a list of dictionaries that I then convert into a data frame with the max values and a category for the throw height.
```python
# Taking the max value of all 3 data sets and adding labels to them
oneFootThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 1}
                       for df in oneFootThrowDataFrameList]
twoFootThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 2}
                       for df in twoFootThrowDataFrameList]
threeFootThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 3}
                         for df in threeFootThrowDataFrameList]

print(oneFootThrowMaxList[0])
print(twoFootThrowMaxList[0])
print(threeFootThrowMaxList[0])

# Combining these into a dataframe for easier data manipulation
dfThrowMaxes = pandas.DataFrame(oneFootThrowMaxList + twoFootThrowMaxList + threeFootThrowMaxList)
display(dfThrowMaxes)
```

I then break this data up into train and test arrays, and then train a logistic regression model with this data.
```python
# X and y data
X = dfThrowMaxes["Acc_Y_Max"].array.to_numpy().reshape(-1, 1)
y = dfThrowMaxes["Throw_Foot_Height"].to_numpy()
#print(X)
#print(y)

# train, test split
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.20,
)

# Logistic regression to classify
logisticRegressionModel = LogisticRegression()
logisticRegressionModel.fit(
    X=X_train,
    y=y_train,
)

HEIGHT_MAP = {
    1: "Low (1ft) Throw",
    2: "Middle (2ft) Throw",
    3: "High (3ft) Throw",
}

# 1=small throw (1ft)
# 2=medium throw (2ft)
# 3=large throw (3ft)
print("Value=", 2000, "; Prediction=", HEIGHT_MAP[logisticRegressionModel.predict(numpy.array([2000]).reshape(-1, 1))[0]])
print("Value=", 2500, "; Prediction=", HEIGHT_MAP[logisticRegressionModel.predict(numpy.array([2500]).reshape(-1, 1))[0]])
print("Value=", 3000, "; Prediction=", HEIGHT_MAP[logisticRegressionModel.predict(numpy.array([3000]).reshape(-1, 1))[0]])
```

### VR Cube Data

Here I am performing similar steps as above, except on the data gathered from the virtual cube. First I create a function that will convert the data into data frames.
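With `test_size=0.20` and 30 labeled maxima, `train_test_split` shuffles the samples and holds out 6 of them for testing. A minimal stdlib sketch of the same idea (illustrative only, not sklearn's implementation; `shuffleSplit` and its seed are hypothetical names for this demo):

```python
import random

def shuffleSplit(X, y, testSize=0.20, seed=42):
    # Shuffle indices so the train/test draws come from all three heights
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)

    cut = int(len(X) * (1 - testSize))
    trainIdx, testIdx = idx[:cut], idx[cut:]
    return ([X[i] for i in trainIdx], [X[i] for i in testIdx],
            [y[i] for i in trainIdx], [y[i] for i in testIdx])

# 30 samples, 10 per throw height, as in the physical cube data
X = list(range(30))
y = [1] * 10 + [2] * 10 + [3] * 10
X_train, X_test, y_train, y_test = shuffleSplit(X, y)
print(len(X_train), len(X_test))  # 24 6
```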
```python
# Collecting all the VR cube data
VIRTUAL_CUBE_PATH = os.path.join("vr_data")

def createDataFrameFromVrLogFolder(folderPath: str) -> pandas.DataFrame:
    filePathList = glob.glob(os.path.join(folderPath, "*"))

    logFileDataList = list()
    for filePath in filePathList:
        with open(filePath) as vrLogFile:
            vrJsonData = json.loads(vrLogFile.read())

        logFileDataList.append({
            "A_y": int(vrJsonData["accY"]) * -1,
            "TimeStamp": datetime.strptime(
                vrJsonData["time"],
                "%HH%MM%SS%fSS"
            ).time(),
        })

    # Finally, piping the list of dictionaries into a pandas.DataFrame
    df = pandas.DataFrame(logFileDataList)
    df["A_y"] = df["A_y"] + (accYAdjustment - df["A_y"].median())

    return df

oneFootVrThrowDataFrameList = [
    createDataFrameFromVrLogFolder(folderPath)
    for folderPath in glob.glob(os.path.join(VIRTUAL_CUBE_PATH, "low-*"))
]
twoFootVrThrowDataFrameList = [
    createDataFrameFromVrLogFolder(folderPath)
    for folderPath in glob.glob(os.path.join(VIRTUAL_CUBE_PATH, "middle-*"))
]
threeFootVrThrowDataFrameList = [
    createDataFrameFromVrLogFolder(folderPath)
    for folderPath in glob.glob(os.path.join(VIRTUAL_CUBE_PATH, "high-*"))
]

for df in oneFootVrThrowDataFrameList:
    display(df.head())
```

Next, I re-use the same function created above to plot charts out for each of these data frames.
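The VR logs store time as a letter-delimited string, and the `%HH%MM%SS%fSS` format string treats the letters as literal separators between the `%H`, `%M`, `%S`, and `%f` fields. For example (with a made-up timestamp value, since the real log contents aren't shown here):

```python
from datetime import datetime

# Hypothetical "time" value from a VR log JSON payload
vrTime = "14H30M15S250000SS"

# %H consumes "14", the literal "H" is skipped, and so on; %f consumes the
# six microsecond digits before the trailing literal "SS"
parsed = datetime.strptime(vrTime, "%HH%MM%SS%fSS").time()
print(parsed)  # 14:30:15.250000
```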
```python
# Plotting out all charts from the three different test cases using the same
# scale for each of the different test cases
i = 0
for df in oneFootVrThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 1, i)

for df in twoFootVrThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 2, i)

for df in threeFootVrThrowDataFrameList:
    i += 1
    displayChartFromPuttyDataFrame(df, 3, i)
```

From here, I get the same max value data I used to train the logistic regression model, except this time using the virtual cube data.
```python
# Taking the max value of all 3 data sets and adding labels to them
oneFootVrThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 1}
                         for df in oneFootVrThrowDataFrameList]
twoFootVrThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 2}
                         for df in twoFootVrThrowDataFrameList]
threeFootVrThrowMaxList = [{"Acc_Y_Max": df["A_y"].max(), "Throw_Foot_Height": 3}
                           for df in threeFootVrThrowDataFrameList]

print(oneFootVrThrowMaxList[0])
print(twoFootVrThrowMaxList[0])
print(threeFootVrThrowMaxList[0])

# Combining these into a dataframe for easier data manipulation
dfVrThrowMaxes = pandas.DataFrame(oneFootVrThrowMaxList + twoFootVrThrowMaxList + threeFootVrThrowMaxList)
display(dfVrThrowMaxes)
```

I then gather some data to determine the accuracy of the model and use the virtual cube data in the logistic regression to see how it classifies the data.
```python
accuracyList = [
    [],
    []
]

for i in range(len(dfVrThrowMaxes["Acc_Y_Max"])):
    accYMax = dfVrThrowMaxes["Acc_Y_Max"][i]
    throwHeight = dfVrThrowMaxes["Throw_Foot_Height"][i]

    # Predicting once per trial rather than repeating the same .predict() call
    predictedHeight = logisticRegressionModel.predict(numpy.array([accYMax]).reshape(-1, 1))[0]

    print(
        "Value=", round(accYMax, 3),
        "| Trial Number=", str(i).zfill(2),
        "| Predicted Height=", HEIGHT_MAP[predictedHeight],
        "| Actual Height=", HEIGHT_MAP[throwHeight],
        "| Actual and Expected Match=", throwHeight == predictedHeight
    )

    accuracyList[0].append(throwHeight)
    accuracyList[1].append(predictedHeight)
```

Finally, I calculate model accuracy.
```python
# Calculating model accuracy
print("Accuracy:", accuracy_score(
    accuracyList[0],
    accuracyList[1]
))
```

Notice the accuracy number is quite low. In our case, as we were trying to use the model to see if the physics of the virtual cube had a similar feel to physics within the physical world, we can interpret this low model accuracy to indicate that the virtual cube does not currently match with natural physics. When examining the data, we can see that the physics in the virtual world were too sensitive, given that the model classified every virtual throw as a low throw.
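For flat lists of labels like these, `accuracy_score` reduces to the fraction of positions where the predicted label matches the actual label; a stdlib-only equivalent would be:

```python
def accuracy(actual, predicted):
    # Fraction of positions where the predicted label equals the actual label
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)

print(accuracy([1, 2, 3, 1], [1, 1, 3, 1]))  # 0.75
```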
## Conclusion

We can see from the data above that the throw height logistic regression model was not able to predict the virtual throw heights correctly. Importantly, we can see that the predicted throw heights were consistently below the actual throw heights that occurred when performing the trials. What this means is that if we are building physics within VR, we can use these types of logistic regression models as a check on how responsive the virtual physics are relative to actual physics. Should we fork and modify the internals of the physics library used within the virtual scene, we could eventually calibrate the virtual physics to roughly match the feel of actual physics, and test that it matches by applying the logistic regression model implemented in this project. As a result, we think that this method of analysis and the use of the Sensor Tile could be very valuable tools for anyone wanting to calibrate Virtual Reality applications so that they have a more natural feel to them.